Comment: Struggles with Survey
Weighting and Regression Modeling 

Sharon L. Lohr 



In the ideal samples of survey sampling textbooks, 
weights are the inverses of the inclusion probabili- 
ties for the units. But nonresponse and undercover- 
age occur, and survey statisticians try to compen- 
sate for the resulting bias by adjusting the sampling 
weights. There has been much debate about when 
and whether weights should be used in analyses, and 
how they should be constructed. Professor Gelman 
deserves thanks for clarifying the discussion about 
weights and for raising interesting issues and ques- 
tions. 

If we use weights in estimation, what would we 
like them to accomplish? Here are some desirable 
properties: 



1. The mean squared error (MSE) of estimators is
smaller if the weights are used than if the weights
are not used.

2. Estimators produced using the weights are internally
consistent. Thus, if $\hat{Y}_1$ is the estimated total
medical expense for men in the population, $\hat{Y}_2$ is
the estimated total medical expense for women
in the population and $\hat{Y}_3$ is the estimated total
medical expense for everyone in the population,
then $\hat{Y}_1 + \hat{Y}_2 = \hat{Y}_3$.

3. We may have independent population counts from
a census or administrative data source for sex,
age, race/ethnicity and other variables. If we apply
the weights to estimate these quantities, the
estimates equal the true population counts. We
refer to this as the calibration property.

4. The weight for unit i in the sample can be thought
of as the number of population units represented
by unit i.

5. The estimators have optimal properties under su- 
perpopulation models that are thought to fit the 
data. 

6. The estimators are robust to misspecifications of 
the superpopulation models. 

7. The procedure for constructing the weights is ob- 
jective and transparent. 

All of these are good properties. The problem is 
that one can only rarely construct a set of weights 
that satisfies all of them simultaneously. 

In this discussion, we distinguish between design 
weights and weighting adjustments used for post- 
stratification. The design weights are 



$$d_i = \frac{1}{P(\text{unit } i \text{ included in sample})}.$$

The design weight $d_i$ is a property of unit $i$; under
design-based inference, it is a fixed constant.
If two samples are drawn independently using the
same probability sampling design and if each sample
includes unit $i$, the weight $d_i$ for the unit is the
same in each sample. Poststratification weight adjustments,
however, depend on the selected sample $S$.
In the simplest case of ratio adjustment, we multiply
each sampling weight $d_i$ by the factor
$g_i(S,x) = X/\hat{X}$, where $X$ is the known population
total of auxiliary variable $x$ and
$\hat{X} = \sum_{i \in S} d_i x_i$. The resulting weight is
$w_i(S,x) = d_i g_i(S,x)$; the weight depends on the
sample selected and on the auxiliary variable $x$
through the estimated total $\hat{X}$. The ratio
estimator of the population total is then
$\hat{Y} = (X/\hat{X}) \sum_{i \in S} d_i y_i = \sum_{i \in S} w_i(S,x) y_i$.
Similarly, for generalized regression estimation,



$$g_i(S,x) = 1 + (X - \hat{X})^T \Bigl(\sum_{j \in S} d_j q_j x_j x_j^T\Bigr)^{-1} q_i x_i,$$



where the scaling constant $q_i$ may depend on $x_i$.
For the special case of poststratification,
$g_i(S,x) = N_c/\hat{N}_c$ for observation $i$ in
poststratification class $c$.
Thus, for poststratification, the weight adjustments
are positive; for general regression models, however,
the weight adjustments are unrestricted. The weight
$w_i(S,x)$ varies from sample to sample. Since the
weight adjustment depends only on $x$, though, and
not on $y$, the weight $w_i(S,x)$ will be the same for
every response variable used in that sample.
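
The ratio adjustment described above can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function name `ratio_weights`, the four-unit sample and the population total of 600 are all hypothetical. It shows that the adjusted weights reproduce the known total of x and that one set of weights serves every response variable.

```python
def ratio_weights(d, x, X_total):
    """Return w_i = d_i * X / X_hat, where X_hat = sum_i d_i * x_i."""
    X_hat = sum(di * xi for di, xi in zip(d, x))
    g = X_total / X_hat  # the same adjustment factor applies to every unit
    return [di * g for di in d]


# Hypothetical sample of four units, each with design weight 25.
d = [25.0, 25.0, 25.0, 25.0]
x = [2.0, 4.0, 6.0, 8.0]   # auxiliary variable observed in the sample
X_total = 600.0            # assumed known population total of x

w = ratio_weights(d, x, X_total)

# Calibration: the adjusted weights reproduce the known total of x.
X_check = sum(wi * xi for wi, xi in zip(w, x))

# The weights depend on S and x only, so two different response
# variables are estimated with the same weights.
y1 = [1.0, 3.0, 2.0, 5.0]
y2 = [10.0, 0.0, 7.0, 4.0]
Y1_hat = sum(wi * yi for wi, yi in zip(w, y1))
Y2_hat = sum(wi * yi for wi, yi in zip(w, y2))
```

If a different sample were drawn, `X_hat` and therefore every weight would change, which is the dependence on S noted in the text.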

The weights proposed by Gelman using hierarchical
models add dependence on $y$ to the mix. When
a proper prior is used for $\beta$ and the parameters in
the covariance matrix are estimated from the data,
the weights change if a different sample is taken and
they also change if a different response variable is
used; we denote this dependence on the response
variable by expressing the weights as $w_i(S,x,y)$.

Now let us look at how various weighting schemes 
address the properties listed above. 

1. Reducing the mean squared error. The ratio- 
nale for any kind of weighting adjustments is that es- 
timators constructed using the weights should have 
smaller MSE than estimators constructed without 
using the weights. If the response variable has the 
same mean in each of the poststrata, then poststrat- 
ification weights decrease efficiency since they in- 
crease the variance of the estimator (Kish, 1992). 
Korn and Graubard (1999, Chapter 4) show that 
even a few weights that differ greatly from the oth- 
ers can substantially increase the variance. But if 
the poststrata have different means, weights often 
decrease the bias of the estimator, particularly when 
there is substantial undercoverage or nonresponse. If 
there is a strong relationship between the response 
variable y and the auxiliary information x, then 
using poststratification or regression modeling to 
adjust the weights can also decrease the variance; 
in the best possible case with $y$ proportional to $x$,
$\sum_{i \in S} w_i(S,x) y_i = Y$, which has variance 0. One reason
to poststratify is to try to compensate for nonresponse
or undercoverage in certain poststrata so
that potential biases are reduced. The population 
may have 100,000 urban residents and 30,000 rural 
residents. Because of nonresponse, a simple random 
sample may end up with 1100 urban residents and 
200 rural residents, so that urban residents are over- 
represented in the sample. If the urban and rural ar- 
eas have different means, an unweighted estimator 
is biased for estimating the population mean. 
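
The urban/rural example can be worked through numerically. The population and respondent counts below come from the text; the stratum means (50 for urban, 30 for rural) are hypothetical values chosen only to make the bias visible, and the sketch assumes that respondents in each area have the same mean as the area as a whole.

```python
pop_counts = {"urban": 100_000, "rural": 30_000}
resp_counts = {"urban": 1_100, "rural": 200}
assumed_means = {"urban": 50.0, "rural": 30.0}  # hypothetical stratum means

# Poststratification weight N_c / n_c for each respondent in class c
# (the design weights are equal under simple random sampling).
post_weights = {c: pop_counts[c] / resp_counts[c] for c in pop_counts}

true_mean = (
    sum(pop_counts[c] * assumed_means[c] for c in pop_counts)
    / sum(pop_counts.values())
)

# Unweighted mean of respondents: urban residents count too heavily.
unweighted_mean = (
    sum(resp_counts[c] * assumed_means[c] for c in resp_counts)
    / sum(resp_counts.values())
)

# Poststratified mean: each respondent is weighted by N_c / n_c.
weighted_total = sum(
    post_weights[c] * resp_counts[c] * assumed_means[c] for c in pop_counts
)
weighted_mean = weighted_total / sum(
    post_weights[c] * resp_counts[c] for c in pop_counts
)
```

Urban residents make up about 85% of respondents but about 77% of the population, so under these assumed means the unweighted mean is pulled toward the urban value while the poststratified mean matches the population mean.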

In hopes of reducing the variance due to disparities
in the weights and reducing the influence of
observations with large weights, various researchers
have proposed shrinkage methods for the weights.
Smoothed weights also help protect the confidentiality
of the data. Traditionally, statistical agencies
have collapsed poststrata, or trimmed weight adjustments
$g_i$ that are too large. The resulting weights
depend on $S$ and $x$, and usually do not depend on
$y$, but are difficult to justify from an optimality
perspective and have an ad hoc quality that some find
disturbing.
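
A rough sketch of trimming follows. It assumes a simple cap on the adjustment factors, which is one of many possible rules and not any particular agency's procedure; the weights and the cap are made up. Variance inflation is gauged with Kish's approximation n * sum(w^2) / (sum w)^2, which is brought in here as an illustration and presumes the weights are unrelated to the response.

```python
def kish_deff(weights):
    """Kish's approximate variance inflation: n * sum(w^2) / (sum w)^2."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def trim_adjustments(g, cap):
    """Cap each weight adjustment g_i at `cap` (a hypothetical rule)."""
    return [min(gi, cap) for gi in g]


d = [10.0] * 10               # equal design weights
g = [1.0] * 8 + [6.0, 10.0]   # two units receive large adjustments
cap = 3.0

w_raw = [di * gi for di, gi in zip(d, g)]
w_trimmed = [di * gi for di, gi in zip(d, trim_adjustments(g, cap))]

deff_raw = kish_deff(w_raw)
deff_trimmed = kish_deff(w_trimmed)

# Trimming lowers the variance inflation but also changes the weighted
# count: the trimmed weights no longer sum to what the raw weights did.
count_lost = sum(w_raw) - sum(w_trimmed)
```

The trade-off is visible in the two summaries: the inflation factor falls, but the weighted count drops from 240 to 140, so some further adjustment would be needed to restore the original total.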

Stokes (1990) shrinks the weights using an empiri- 
cal Bayes approach. Elliott and Little (2000) shrink 
the weights using mixed models. Gelman's proce- 
dure smooths the weights by using a hierarchical 
model to shrink the regression parameter estimates. 
These procedures have desirable properties under 
the models used, but give weights that depend on 
y as well as S and x. Thus, a different set of weights 
would be used for each response variable. 

2. Internal consistency. To have internal consistency,
the weights need to be the same for each response
variable. Weights of the form $w_i(S,x,y)$ that
depend on the response variable, such as those
resulting from the hierarchical regression approach of
Gelman's paper, can, as he points out, lead to
internally inconsistent estimators. In addition,
multivariate statistics are affected if different weights are
used for different variables.
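
The additivity requirement can be checked directly. In the sketch below all numbers are invented: `w_common` stands for any weights that do not depend on y, and the three variable-specific weight vectors are arbitrary stand-ins for weights of the form $w_i(S,x,y)$, not the output of a hierarchical model.

```python
def weighted_total(w, y):
    """Weighted estimate of a population total."""
    return sum(wi * yi for wi, yi in zip(w, y))


# Medical expenses for four sampled people; units 1 and 3 are men.
y_men = [5, 0, 7, 0]      # expense if the person is a man, else 0
y_women = [0, 3, 0, 2]    # expense if the person is a woman, else 0
y_all = [m + f for m, f in zip(y_men, y_women)]

# One set of weights for every variable: totals add up exactly.
w_common = [10, 20, 30, 40]
consistent = (
    weighted_total(w_common, y_men) + weighted_total(w_common, y_women)
    == weighted_total(w_common, y_all)
)

# Hypothetical variable-specific weights: additivity is not guaranteed.
w_men = [12, 20, 28, 40]
w_women = [10, 18, 30, 42]
w_all = [11, 19, 29, 41]
gap = weighted_total(w_all, y_all) - (
    weighted_total(w_men, y_men) + weighted_total(w_women, y_women)
)
```

With the common weights the identity holds for any data because the estimator is linear in y; with separate weights the gap is generally nonzero.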

Alexander (1991), discussing papers on whether 
to use weights in regression models based on survey 
data, asked: "Are we really to use weighted results 
for some parts of a report and unweighted results 
for others?" A similar question can be asked here: 
Are we really to use one set of weights to estimate 
unemployment, another set of weights to estimate 
poverty, and yet another set to estimate the rela- 
tion between poverty and unemployment? I think 
that for official statistics, internal consistency is very 
important and therefore weights that do not depend 
on y are preferred. 

One possibility for obtaining internal consistency 
is to obtain one set of shrinkage weights that is then 
used for all variables. Chambers (1996) and Rao and 
Singh (1997) proposed using ridge-regression meth- 
ods to shrink the weights. These methods depend 
only on the x variables and not on y. 

3. Calibration. Often calibration is desirable so 
that demographic counts and other quantities will 
be consistent across surveys, and be consistent with 
the census. Poststratified weights satisfy the calibra- 
tion property. Shrinkage weights, in general, do not, 
although some methods are closer to satisfying it 
than others. 
Calibration is often deemed more important for
larger classifications than for finer classifications. For
example, it might be considered important for the
weighted counts to equal the population counts for
sex × ethnicity categories, but less important for
sex × ethnicity × age categories. Statistical agencies
implicitly order the importance of calibration
classifications when they devise procedures for
collapsing poststratification cells.

In regression models for weights, exact calibration
may be achieved by using a noninformative prior for
all components of $\beta$, that is, using classical
regression. One possible variation of using hierarchical
regression for constructing weights might be to use the
prior distribution of $\beta$ to try to control the degree of
calibration for different variables. However, as soon
as an informative prior is introduced for any
component of $\beta$, the calibration for the other variables
can disappear. More research is needed on how the
weights can be smoothed yet calibrate to the most
important population quantities.

4. Weight as population units represented by sample
unit. The design weights $d_i$ are commonly thought
of as the number of population units represented by
unit $i$ in the sample. The weights $w_i(S,x)$ from
poststratification can be thought of in the same way, as
long as the adjustment is not too extreme. In some
cases, though, the adjusted weight can be less than
1, which would lead to the interpretation that the
sampled unit represents only a fraction of a unit in
the population; that is, the sampled unit does not
even represent itself.

Deville and Särndal (1992) point out that weights
from generalized regression estimators can be neg- 
ative, which presents even more problems for inter- 
pretation. Weights can be negative in the regression 
models in (7) and (9) of Gelman's paper when some 
interactions are omitted from the model. Thus if the 
regression model used to construct the weights has 
main effects terms for sex and ethnicity but does not 
contain the interaction term, it is easy to construct 
examples in which the weights of some observations 
are negative even though weights for females sum 
to the population count for females and the weights 
for blacks sum to the population count for blacks. 
Negative weights can be awkward to explain and are 
unacceptable for many users. 
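
One such example can be constructed as follows. The sketch uses linear calibration weights (generalized regression with all scaling constants equal to 1) on main effects for sex and ethnicity only; it is not a reproduction of equations (7) or (9) of Gelman's paper. The cell counts, the common design weight of 10 and the population totals are hypothetical, chosen so that sex and ethnicity are strongly associated in the sample while the population margins pull in the opposite direction.

```python
import numpy as np

# Hypothetical respondent counts by (female, black) cell.
cell_counts = {(0, 0): 50, (0, 1): 5, (1, 0): 5, (1, 1): 40}
design_weight = 10.0

# One row per respondent: intercept, female indicator, black indicator.
rows = [[1.0, f, b] for (f, b), n in cell_counts.items() for _ in range(n)]
X = np.array(rows, dtype=float)
d = np.full(len(X), design_weight)

# Assumed known population totals: everyone, females, blacks.
pop_totals = np.array([1000.0, 700.0, 300.0])

X_hat = X.T @ d                        # design-weighted estimated totals
T = X.T @ (d[:, None] * X)             # sum_i d_i x_i x_i^T
lam = np.linalg.solve(T, pop_totals - X_hat)
g = 1.0 + X @ lam                      # adjustment factor for each unit
w = d * g                              # calibrated weights

# Final weight for each cell (constant within a cell).
cell_weights = {
    (f, b): float(design_weight * (1.0 + np.array([1.0, f, b]) @ lam))
    for (f, b) in cell_counts
}

calibrated = np.allclose(X.T @ w, pop_totals)
has_negative = bool(w.min() < 0)
```

Here the weights sum to the assumed counts for the whole population, for females and for blacks, yet the five men in the (male, black) cell each receive a negative weight, because the model has no interaction term to absorb the imbalance.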

5. Model-based properties. Estimators that have
been proposed have good properties under the
superpopulation models that generate them. This is
important, particularly when the models are fit to
explore relationships among the variables. Gelman
rightly points out that an important issue is how
to tell when we can have confidence in the
regression coefficients from the model, particularly when
many covariates are included.

6. Robustness. Holt and Smith (1979) emphasize 
robustness as one of the virtues of poststratifica- 
tion. Robustness is of course the big concern in any 
model-based weighting scheme, particularly when 
nonresponse or undercoverage occur since then one 
cannot check that the model holds for nonrespon- 
dents. 

Some private survey organizations now take con- 
venience samples and reweight the data in an at- 
tempt to generalize to the population (Schonlau, 
Fricker and Elliott, 2002). The accuracy of any pop- 
ulation estimate from this method, such as the es- 
timated percentage of people who think the govern- 
ment should provide health insurance for all chil- 
dren, then depends entirely on the model underlying 
the weighting scheme. If that model does not hold 
for individuals outside the sample, then the popula- 
tion estimates have unknown quality. 

The tree of weights in Gelman's Figure 2 is a won- 
derful tool for studying the weights that result from 
various models. Figure 2 makes it clear that the big 
difference in the weight variability occurs in the ex- 
ample studied when education categories are added 
to the weighting model. Other trees could be drawn 
when the factors for weighting are considered in a 
different order, or when robust regression methods 
are used to estimate the parameters. 

7. Objectivity and perceived fairness of weighting 
procedure. In most areas of statistics we strive for 
estimators with low MSE under reasonable models. 
In surveys, however, we want estimators that meet 
additional criteria. Because surveys are used for of- 
ficial statistics, those statistics should be accept- 
able by all participants in policy debates. Alexander 
(1994) emphasized that a survey statistician should 
be able to defend the choice of estimator to "politi- 
cians," defining "politician" as "anyone who is hop- 
ing to see your survey yield a particular result." 

We should be able to defend the procedure used 
to develop weights to politicians and nonpoliticians 
alike. Any procedure used to construct weights will 
include subjective judgments — which variables to in- 
clude in a model or how to construct and combine 
weighting adjustment cells. But persons who con- 
struct weights using models with the y variable need 
to be especially careful that the models do not bias 
official statistics. This means careful attention to 
variable selection and influential observations, and
exploration of a range of models. 

Gelman's paper begins with the statement "Sur- 
vey weighting is a mess." I do not think that survey 
weighting is a mess, but I do think that many peo- 
ple ask too much of the weights. For any weighting 
problem, one should begin by defining which of the 
possibly contradictory goals for the weights are de- 
sirable. Social scientists interested primarily in rela- 
tionships among variables may value optimality un- 
der the model above all other features. I think that 
internal consistency of estimators and transparency 
of the weight construction method are essential for 
official statistics. Gelman's thought-provoking and 
informative paper made me think in a new way 
about weights, and I look forward to his future con- 
tributions to this discussion. 